TOPNMF: Topic based Document Clustering using Non-negative Matrix Factorization
Abstract
Objectives: This work focuses on creating targeted, content-specific, topic-based clusters that help users discover the topics in a set of documents more efficiently.
Methods/Statistical analysis: Non-negative Matrix Factorization (NMF) based models learn by directly decomposing the term-document matrix, a bag-of-words representation of the text corpus, into two low-rank factor matrices, namely the Word-Topic feature Matrix (WTOM) and the Document-Topic Matrix (DTOM). Topic clusters of documents are extracted from the obtained feature matrices. The method does not require any statistical probability distribution. Experiments were carried out on a subset of the BBC Sport corpus.
Findings: The experimental results indicate that the accuracy of TOPNMF was observed to be 100 percent.
Novelty/Applications: NMF often fails to improve a given clustering result, as the number of parameters increases linearly with the size of the corpus. The computational complexity of TOPNMF is better than that of an exact decomposition such as Singular Value Decomposition (SVD).
Keywords: cluster; factorization; K-means clustering; Word cloud
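The abstract does not state which update rules TOPNMF uses. As a minimal sketch, assuming the standard Lee-Seung multiplicative updates for the Frobenius-norm objective, the Python/NumPy code below factors a term-document matrix V into a word-topic matrix W and a topic-document matrix H (whose transpose plays the role of a document-topic matrix), assigns each document to its highest-weight topic, and scores the result with a permutation-matched clustering accuracy. The toy corpus, the function names `nmf` and `clustering_accuracy`, and the argmax assignment rule are hypothetical illustrations, not the authors' implementation; running K-means on the rows of `H.T` (K-means appears among the keywords) would be an alternative to the argmax step.

```python
import itertools

import numpy as np


def nmf(V, k, n_iter=1000, eps=1e-10, seed=0):
    """Factor a non-negative term-document matrix V (terms x docs) as V ~= W @ H.

    W (terms x k) acts as a word-topic matrix; H (k x docs) holds topic weights
    per document, so H.T acts as a document-topic matrix.
    Assumption: Lee-Seung multiplicative updates for the Frobenius-norm loss.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_docs)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def clustering_accuracy(true_labels, pred_labels, k):
    """Best accuracy over all one-to-one relabelings of the k predicted clusters.

    Brute force over k! permutations, so only suitable for small k.
    """
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array([perm[c] for c in pred_labels])
        best = max(best, float(np.mean(mapped == true_labels)))
    return best


# Hypothetical toy corpus: rows are terms, columns are documents.
terms = ["goal", "match", "striker", "racket", "serve", "court"]
V = np.array([
    [3, 2, 4, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [1, 2, 3, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 0, 3, 1, 2],
    [0, 0, 0, 1, 2, 3],
], dtype=float)
true_labels = [0, 0, 0, 1, 1, 1]  # football docs, then tennis docs

W, H = nmf(V, k=2)
doc_clusters = H.argmax(axis=0)  # each document goes to its heaviest topic
# Top terms per topic, read off the columns of W.
top_terms = [[terms[i] for i in np.argsort(W[:, j])[::-1][:3]] for j in range(2)]
print("document clusters:", doc_clusters)
print("top terms per topic:", top_terms)
print("accuracy:", clustering_accuracy(true_labels, doc_clusters, k=2))
```

Brute-forcing all k! label permutations is only practical for a handful of clusters; for larger k, the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) finds the same best matching.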
Similar resources
Clinical Document Clustering using Multi-view Non-Negative Matrix Factorization
Clinical documents contain vital information such as symptom names, medication names, age, gender and some demographic information. This information can be used to give quick relief from a disease. Existing work built a system for clustering symptom names and medication names using Multi-View Non-Negative Matrix Factorization. While considering the clinical documents the facto...
Document Clustering Based On Max-Correntropy Non-Negative Matrix Factorization
Nonnegative matrix factorization (NMF) has been successfully applied to many areas for classification and clustering. Commonly used NMF algorithms mainly target minimizing the l2 distance or the Kullback-Leibler (KL) divergence, which may not be suitable for the nonlinear case. In this paper, we propose a new decomposition method by maximizing the correntropy between the original and the product of ...
Parallel Non Negative Matrix Factorization for Document Clustering
Non-negative matrix factorization has lately been used as an effective approach for document clustering. One advantage of this method is that clustering results can be derived directly from the factor matrices. This project gives a parallel implementation of three algorithms for non-negative matrix factorization. Experiments with these parallel algorithms on large datasets show good speedup for...
Topic supervised non-negative matrix factorization
Topic models have been extensively used to organize and interpret the contents of large, unstructured corpora of text documents. Although topic models often perform well on traditional training vs. test set evaluations, it is often the case that the results of a topic model do not align with human interpretation. This interpretability fallacy is largely due to the unsupervised nature of topic m...
Document Clustering Based on Spectral Clustering and Non-negative Matrix Factorization
In this paper, we propose a novel non-negative matrix factorization (NMF) to the affinity matrix for document clustering, which enforces nonnegativity and orthogonality constraints simultaneously. With the help of orthogonality constraints, this NMF provides a solution to spectral clustering, which inherits the advantages of spectral clustering and presents a much more reasonable clustering int...
Journal
Journal title: Indian Journal of Science and Technology
Year: 2021
ISSN: 0974-5645, 0974-6846
DOI: https://doi.org/10.17485/ijst/v14i31.1293